Excluding the Irrelevant: Focusing Reinforcement Learning through Continuous Action Masking

Neural Information Processing Systems

Continuous action spaces in reinforcement learning (RL) are commonly defined as multidimensional intervals. While intervals usually reflect the action boundaries for tasks well, they can be challenging for learning because the typically large global action space leads to frequent exploration of irrelevant actions. Yet, little task knowledge can be sufficient to identify significantly smaller state-specific sets of relevant actions. Focusing learning on these relevant actions can significantly improve training efficiency and effectiveness. In this paper, we propose to focus learning on the set of relevant actions and introduce three continuous action masking methods for exactly mapping the action space to the state-dependent set of relevant actions. Thus, our methods ensure that only relevant actions are executed, enhancing the predictability of the RL agent and enabling its use in safety-critical applications. We further derive the implications of the proposed methods on the policy gradient. Using Proximal Policy Optimization (PPO), we evaluate our methods on four control tasks, where the relevant action set is computed based on the system dynamics and a relevant state set. Our experiments show that the three action masking methods achieve higher final rewards and converge faster than the baseline without action masking.
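To make the idea of continuous action masking concrete, one simple instance of such an exact mapping is an element-wise affine rescaling of the policy's normalized output onto a state-dependent relevant interval. This is an illustrative sketch, not the paper's actual methods; the function name and the example bounds are assumptions for demonstration.

```python
import numpy as np

def mask_action(raw_action, low, high):
    """Affinely rescale a normalized action in [-1, 1]^d onto the
    state-dependent relevant interval [low, high] (element-wise).
    Every output is guaranteed to lie in the relevant set."""
    raw_action = np.clip(raw_action, -1.0, 1.0)
    return low + 0.5 * (raw_action + 1.0) * (high - low)

# Example: the global action space is [-2, 2], but in the current
# state only actions in [0.5, 1.5] are relevant (assumed bounds).
a = mask_action(np.array([0.0]), low=np.array([0.5]), high=np.array([1.5]))
# a -> [1.0], the midpoint of the relevant interval
```

Because the mapping is applied inside the environment step, the agent's executed actions never leave the relevant set, which is what makes such masking attractive for safety-critical use.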




Causality-Driven Reinforcement Learning for Joint Communication and Sensing

Roy, Anik, Banerjee, Serene, Sadasivan, Jishnu, Sarkar, Arnab, Dey, Soumyajit

arXiv.org Artificial Intelligence

The next-generation wireless network, 6G and beyond, envisions integrating communication and sensing to overcome interference, improve spectrum efficiency, and reduce hardware and power consumption. Massive Multiple-Input Multiple-Output (mMIMO)-based Joint Communication and Sensing (JCAS) systems realize this integration for 6G applications such as autonomous driving, which requires accurate environmental sensing and time-critical communication with neighboring vehicles. Reinforcement Learning (RL) is used for mMIMO antenna beamforming in the existing literature. However, the huge search space of actions associated with antenna beamforming makes the learning process for the RL agent inefficient due to high beam training overhead. The learning process does not consider the causal relationship between the action space and the reward, and gives all actions equal importance. In this work, we explore a causally-aware RL agent which can intervene and discover causal relationships in mMIMO-based JCAS environments during the training phase. We use a state-dependent action dimension selection strategy to realize causal discovery for RL-based JCAS. Evaluation of the causally-aware RL framework in different JCAS scenarios shows the benefit of our proposed framework over baseline methods in terms of beamforming gain.
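The core idea of state-dependent action dimension selection can be sketched as ranking action dimensions by an estimated influence on the reward and exploring only the top-ranked ones. This is a simplified, hypothetical stand-in for the paper's causal discovery procedure; the scores and the selection rule are assumptions for illustration only.

```python
import numpy as np

def select_relevant_dims(influence_scores, k):
    """Keep only the top-k action dimensions, ranked by an estimated
    influence of each dimension on the reward. A real causally-aware
    agent would estimate these scores via interventions; here they
    are simply given as input."""
    order = np.argsort(influence_scores)[::-1]  # descending by score
    return set(order[:k].tolist())

# Assumed influence estimates for a 4-dimensional beamforming action:
scores = np.array([0.1, 0.9, 0.3, 0.7])
dims = select_relevant_dims(scores, k=2)
# dims -> {1, 3}: exploration is restricted to the two most
# reward-relevant dimensions, shrinking the beam search space
```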


Excluding the Irrelevant: Focusing Reinforcement Learning through Continuous Action Masking

Stolz, Roland, Krasowski, Hanna, Thumm, Jakob, Eichelbeck, Michael, Gassert, Philipp, Althoff, Matthias

arXiv.org Artificial Intelligence

Continuous action spaces in reinforcement learning (RL) are commonly defined as interval sets. While intervals usually reflect the action boundaries for tasks well, they can be challenging for learning because the typically large global action space leads to frequent exploration of irrelevant actions. Yet, little task knowledge can be sufficient to identify significantly smaller state-specific sets of relevant actions. Focusing learning on these relevant actions can significantly improve training efficiency and effectiveness. In this paper, we propose to focus learning on the set of relevant actions and introduce three continuous action masking methods for exactly mapping the action space to the state-dependent set of relevant actions. Thus, our methods ensure that only relevant actions are executed, enhancing the predictability of the RL agent and enabling its use in safety-critical applications. We further derive the implications of the proposed methods on the policy gradient. Using Proximal Policy Optimization (PPO), we evaluate our methods on three control tasks, where the relevant action set is computed based on the system dynamics and a relevant state set. Our experiments show that the three action masking methods achieve higher final rewards and converge faster than the baseline without action masking.


Language Guided Exploration for RL Agents in Text Environments

Golchha, Hitesh, Yerawar, Sahil, Patel, Dhruvesh, Dan, Soham, Murugesan, Keerthiram

arXiv.org Artificial Intelligence

Real-world sequential decision making is characterized by sparse rewards and large decision spaces, posing significant difficulty for experiential learning systems like tabula rasa reinforcement learning (RL) agents. Large Language Models (LLMs), with a wealth of world knowledge, can help RL agents learn quickly and adapt to distribution shifts. In this work, we introduce the Language Guided Exploration (LGE) framework, which uses a pre-trained language model (called GUIDE) to provide decision-level guidance to an RL agent (called EXPLORER). We observe that on ScienceWorld (Wang et al., 2022), a challenging text environment, LGE significantly outperforms vanilla RL agents and also outperforms other sophisticated methods like Behaviour Cloning and Text Decision Transformer.
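The decision-level guidance described above can be pictured as the language model pruning the candidate action set before the RL agent chooses. The sketch below is a minimal illustration under assumed names; `guide_score` stands in for the LM scorer and is not the paper's actual GUIDE component.

```python
def guided_action_set(candidates, guide_score, top_k=3):
    """Return only the top_k candidate actions as ranked by a
    language-model scorer (guide_score is a hypothetical stand-in
    for GUIDE). The RL agent then explores this smaller set."""
    ranked = sorted(candidates, key=guide_score, reverse=True)
    return ranked[:top_k]

# Toy scorer: prefer text actions mentioning "open" (assumed heuristic)
score = lambda action: 1.0 if "open" in action else 0.0
acts = guided_action_set(
    ["open door", "eat rock", "open jar", "wait"], score, top_k=2
)
# acts -> ["open door", "open jar"]
```

Restricting exploration to LM-approved actions is what lets the agent cope with the large, sparse-reward decision space of a text environment.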


Create a Bot to Find Diamonds in Minecraft

#artificialintelligence

Minecraft is the next frontier for Artificial Intelligence. It takes an entire wiki with over 8000 pages just to teach humans how to play Minecraft. So how good can artificial intelligence be? This is the question we'll answer in this article. We'll design a bot and try to achieve one of the most difficult challenges in Minecraft: finding diamonds from scratch.


When Planning Should Be Easy: On Solving Cumulative Planning Problems

Bartak, Roman (Charles University in Prague) | Dvorak, Filip (Charles University in Prague) | Gemrot, Jakub (Charles University in Prague) | Brom, Cyril (Charles University in Prague) | Toropila, Daniel (Charles University in Prague)

AAAI Conferences

This paper deals with planning domains that appear in computer games, especially when modeling intelligent virtual agents. Some of these domains contain only actions with no negative effects and are thus treated as easy from the planning perspective. We propose two new techniques to solve the problems in these planning domains, a heuristic search algorithm ANA* and a constraint-based planner RelaxPlan, and we compare them with state-of-the-art planners that were successful in the IPC, using planning domains motivated by computer games.


The Influence of k-Dependence on the Complexity of Planning

Gimenez, Omer (Universitat Politecnica de Catalunya) | Jonsson, Anders (Universitat Pompeu Fabra)

AAAI Conferences

A planning problem is k-dependent if each action has at most k preconditions on variables unaffected by the action. This concept is well-founded since k is a constant for all but a few of the standard planning domains, and is known to have implications for tractability. In this paper, we present several new complexity results for P(k), the class of k-dependent planning problems with binary variables and polytree causal graphs. The problem of plan generation for P(k) is equivalent to determining how many times each variable can change. Using this fact, we present a polytime plan generation algorithm for P(2) and P(3). For constant k > 3, we introduce and use the notion of a cover to find conditions under which plan generation for P(k) is polynomial.